Functional data geometric morphometrics with machine learning for craniodental shape classification in shrews

This work proposes a functional data analysis approach for morphometrics in classifying three shrew species (S. murinus, C. monticola, and C. malayana) from Peninsular Malaysia. Functional data geometric morphometrics (FDGM) for 2D landmark data is introduced and its performance is compared with classical geometric morphometrics (GM). The FDGM approach converts 2D landmark data into continuous curves, which are then represented as linear combinations of basis functions. The landmark data was obtained from 89 crania of shrew specimens based on three craniodental views (dorsal, jaw, and lateral). Principal component analysis and linear discriminant analysis were applied to both GM and FDGM methods to classify the three shrew species. This study also compared four machine learning approaches (naïve Bayes, support vector machine, random forest, and generalised linear model) using predicted PC scores obtained from both methods (a combination of all three craniodental views and individual views). The analyses favoured FDGM and the dorsal view was the best view for distinguishing the three species.

a continuous surface. This approach naturally aligns with the FDA perspective, as elucidated by Ramsay and Silverman 19. To ensure that the functions are well-aligned for geometric features such as peaks and valleys, curve registration 23,24 or functional alignment 25 are applied to warp the temporal domain of functions 22. The FDA framework surpasses its counterparts, including both the landmark-based approach and the set theory approach with principal component analysis (PCA), when applied to a well-known database of bone outlines 26. The set theory approach is adopted from a methodology outlined in Horgan 27, which treats shapes as sets 26. Each position within the image corresponds to a binary variable, indicating whether it belongs to the shape or not. Consequently, the study performed PCA specifically tailored for binary data 26. Building on Tian's study of FDA in brain imaging analysis 28, our research aims to explore FDGM's capacity to enhance sensitivity to subtle shape variations through the analysis of continuous function-based shape changes. This is particularly significant for studying species with minor morphological distinctions or monitoring subtle changes in response to environmental factors.
In our study, we transform landmark data into functional data following generalised Procrustes analysis (GPA). Generalised Procrustes analysis employs rigid transformations, including translation, rotation, and scaling, to align landmark configurations, standardising them for comparison 29. However, this method may not fully address non-rigid deformations or shape changes independent of position, orientation, or size. Consequently, GPA might not capture all aspects of shape variation, particularly those involving local deformations or complex shapes. To address this limitation, we employ FDA to model non-rigid deformations and intricate shape changes undetected by GPA. By analysing shape changes as continuous functions, FDA can identify and quantify subtle variations and local deformations, offering a more comprehensive understanding of shape variation. Moreover, GPA mandates a one-to-one correspondence between landmarks across specimens, simplifying analysis but potentially overlooking true anatomical correspondence, especially when dealing with ambiguous landmarks 29. In contrast, FDA relaxes this requirement, aligning shapes based on overall shape curves or surfaces rather than exact landmark correspondences. This allows for more flexible matching of shapes, particularly when landmarks are ambiguous or difficult to identify consistently.
We utilise the functional data to perform multivariate functional principal component analysis (MFPCA) to observe variation among three shrew species, comparing the results with principal component analysis (PCA) in GM. Multivariate functional principal component analysis generates principal component scores (MFPC scores), capturing major sources of shape variation among the species. Landmark data sampled from curves are succinctly represented by continuous curves based on the Karhunen-Loève theorem. Our study demonstrates that FDGM can identify shape differences using classification methods, offering insights into underlying factors such as ecology or behavior. While GM standardises landmark configurations effectively, the integration of FDA enhances morphometric analysis by capturing shape variation more comprehensively and sensitively, especially in complex structures like skulls.
We aim to implement the FDGM framework to observe the existence of significant differences in the craniodental shapes of three species of shrews. We organise our study around the null hypothesis that there are no significant differences in craniodental shapes among the three species of shrews under study. Any observed variations are attributable to random fluctuations or measurement errors, rather than indicative of genuine distinctions related to ecological niches or evolutionary processes. The hypothesis is framed within the framework of traditional morphological analyses, which have long been instrumental in understanding evolutionary relationships and ecological adaptations among shrew species. Shrews, being small mammals with diverse habitats and diets, provide an intriguing subject for morphological investigation. Thus, these craniodental differences can be related to the different ecological niches that these three species occupy 30.
For our analysis, we collected 89 adult shrew specimens: 29 from S. murinus, 30 from C. monticola, and 30 from C. malayana. The habitats of C. malayana span diverse locations, including Lata Belatan, Terengganu; Ulu Gombak; Aur Island, Johor; Pangkor Island, Perak; Bukit Rengit, Pahang; Cheras Road, Kuala Lumpur; Port Dickson, Negeri Sembilan; and Dusun Tua, Selangor. Conversely, C. monticola exhibits a broader habitat range, inhabiting environments such as Ulu Gombak; Wang Kelian, dominated by secondary lowland forest, and Maxwell Hill, an upper dipterocarp forest, among others. Suncus murinus, on the other hand, is observed in locations like Wang Kelian, Perlis; Alor Setar, Kedah; Air Hitam, Pulau Pinang; Lumut, Perak; Ulu Gombak, Selangor; and Bukit Katil, Melaka. These varied habitats likely contribute to the divergence in craniodental morphology between species. Notably, C. malayana and C. monticola coexist in sympatry in Ulu Gombak, sharing the same habitat or niche. This study aims to elucidate the relationships between these species, offering valuable insights into the evolutionary processes shaping their craniodental morphology.
Morphometric studies for classification and identification tasks are enhanced by extensive machine learning methods 31. The naive Bayes (NB), support vector machine (SVM), random forest (RF), and generalised linear model (GLM) classifiers 32 are frequently applied because they have been used successfully in many previous studies. In Rodrigues et al., NB was the best classifier for detecting landmarks in automatic wing geometric morphometrics classification of honeybee (Apis mellifera) subspecies 33. Thomas et al. also applied the NB classifier in a novel GM approach that automates morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias 34, which indicates that NB can be a valuable tool for classification and pattern recognition tasks based on shape data. Bellin et al. successfully combined geometric morphometrics with different machine learning algorithms, including SVM with a radial basis function (RBF) kernel. Their study demonstrated the effectiveness of SVM in correctly classifying two Anopheles sibling species of the Maculipennis complex based on shape data 35. Hence, this study aims to incorporate supervised learning, particularly SVM, for the classification of three shrew species based on their morphological features. SVM can be used to classify shapes into different categories based on their landmark coordinates or shape descriptors. In this approach, each binary classifier separates the points of two different species, and combining all one-vs-one classifiers yields a multiclass classifier.
www.nature.com/scientificreports/
Arai et al. applied RF in the context of morphological identification in skulls, specifically between spotted seals and harbour seals, using geometric morphometrics. The study achieved an identification accuracy rate of 100% using RF by narrowing down to a subset of eight key landmarks out of a total of 75 landmarks 36. The ensemble nature of RF allows it to capture both linear and non-linear relationships in the data, making it robust and accurate for shape classification tasks. The success of RF in morphological identification 35,37,38 has encouraged this study to assess the effectiveness of this classifier in the classification of the shrew species based on the FDGM framework. Generalised linear models (GLM) serve as extensions of linear models, enabling the accommodation of nonlinearity and non-constant variance within the data. Consequently, GLMs are equipped to handle various data distributions, making them well-suited for analysing species-habitat relationships, which often deviate from normal distributions 39.

Shrew skull image acquisition
The skulls of C. malayana, C. monticola, and S. murinus can be viewed from different angles, i.e., dorsal, jaw, and lateral, depending on the shape of the specimen (Fig. 1). A total of 89 specimens of three different shrew species (C. malayana, C. monticola, S. murinus) were retrieved from the Museum of Zoology, Universiti Malaya (UM), Kuala Lumpur, Malaysia. All the skulls extracted from each specimen were separately placed in small bottles for geometric morphometrics. Skull digital images were captured using a Nikon D90 with 15× magnification and stored in the Tagged Image File Format (TIFF) with a resolution of 4288 × 2848 pixels. Adobe Photoshop CS6 was also used to improve the image quality 40.

Landmark data acquisition
After the images were acquired, we used TPSUtil32 to obtain the TPS files for all three views. These files were then used in TPSDig2 for landmarking. Replicates were generated by digitising landmarks on each of the shrew images three times. This method ensured consistency and reproducibility by having the same observer capture the landmarks on three separate occasions. By comparing these replicated landmarks, any variation or errors introduced by the observer during the process could be quantified and assessed. The average of these replicated landmarks was subsequently utilised for further analysis.
We analysed the dorsal, jaw, and lateral views of each specimen, with a total of 25, 50, and 47 landmarks, respectively, being sequentially placed on each specimen. These landmarks, consisting of both Type I and Type III landmarks, collectively form the landmark configuration for each view.
For the dorsal view, 25 landmarks were placed, including 16 Type I landmarks (LM1, LM4-LM11, LM13-LM15, LM22-LM25) and 9 Type III landmarks (SLM2-SLM3, SLM12, SLM16-SLM21). Similarly, in the jaw view, 50 landmarks were positioned, comprising 32 Type I landmarks (LM1, LM3-LM22, LM24-LM26, LM32-LM35, LM41-LM43, LM48, LM50) and 18 Type III landmarks (SLM2, SLM23, SLM27-SLM31, SLM36-SLM40, SLM44-SLM47, SLM49). Following MacLeod (2013), we refrained from applying any specific treatment to semi-landmarks, such as sliding landmark analysis, in the geometric morphometric analysis. This is to prevent any alteration of the original geometric relationships, which would complicate the interpretation of the results 41,42. Our approach aligns with previous craniodental studies on shrews, which similarly avoided specific treatments for semi-landmarks. For instance, White and Searle examined the correlation between genetic diversity and fluctuating asymmetry (FA) in mandibles from island and mainland populations of common shrews on the west coast of Scotland using GM analysis 43. Similarly, Quintela et al. investigated the geographic variation in skull size and shape among populations of the swamp rat Scapteromys tumidus across eight geographic clusters in southern Brazil, utilizing dorsal, ventral, and lateral views of the skull 44.
The statistical analysis of the three views was performed in R version 4.2.1. To prepare the GM data, the raw coordinates obtained from the landmarks of all three craniodental views were processed using generalised Procrustes analysis (GPA), which registers the configurations by translation, rotation, and scaling, via the gpagen function in the geomorph package 45. Outline methods can produce useful and valid results when suitably constrained by landmarks 46. This leads to the main idea of this work: incorporating the FDGM approach to observe the separation among the three shrew species.
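To make the GPA step concrete, the following is a minimal Python sketch for 2D landmarks, not the gpagen implementation used in the study. It treats each landmark as a complex number so that translation, scaling, and rotation become simple arithmetic. The function names (`center_and_scale`, `rotate_onto`, `gpa`) and the triangle data are hypothetical, and refinements such as reflection handling or tangent-space projection are omitted.

```python
import cmath
import math


def center_and_scale(config):
    """Remove translation and scale a 2D configuration to unit centroid size."""
    z = [complex(x, y) for x, y in config]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]


def rotate_onto(z, ref):
    """Rotate z about the origin to minimise its squared distance to ref."""
    inner = sum(r * p.conjugate() for p, r in zip(z, ref))
    if abs(inner) == 0:
        return list(z)
    rot = inner / abs(inner)  # unit complex number encoding the optimal rotation
    return [rot * p for p in z]


def gpa(configs, n_iter=10):
    """Iteratively align all configurations to their (renormalised) mean shape."""
    shapes = [center_and_scale(c) for c in configs]
    mean = shapes[0]
    for _ in range(n_iter):
        shapes = [rotate_onto(s, mean) for s in shapes]
        mean = [sum(s[j] for s in shapes) / len(shapes) for j in range(len(mean))]
        norm = math.sqrt(sum(abs(p) ** 2 for p in mean))
        mean = [p / norm for p in mean]
    return shapes, mean


def similarity_transform(config, angle, scale, shift):
    """Rotate, scale, and translate a configuration (used only to build test data)."""
    rot = cmath.exp(1j * angle)
    moved = [scale * rot * complex(x, y) + complex(*shift) for x, y in config]
    return [(p.real, p.imag) for p in moved]


# Hypothetical data: a triangle and a rotated, enlarged, shifted copy of it.
triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
copy = similarity_transform(triangle, angle=0.7, scale=3.0, shift=(5.0, -2.0))
aligned, mean_shape = gpa([triangle, copy])
max_gap = max(abs(a - b) for a, b in zip(aligned[0], aligned[1]))
```

Because the two configurations differ only by position, orientation, and size, `max_gap` is numerically zero after alignment; any residual difference between real specimens would be shape variation.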

Functional data geometric morphometrics
Functional data analysis (FDA) is a methodology employed to examine raw data exhibiting dynamic patterns over time, space, or other intricate dimensions. In this study, the sampling points under consideration comprise landmarks from three craniodental perspectives of shrews, represented as functional forms. The framework is summarised as follows. Each sampling point is vector-valued, as two spatial coordinates (the x- and y-coordinates) are involved. Let $(x_{k,\tau_j}, y_{k,\tau_j})$, where $k = 1, \ldots, n$ and $j = 1, \ldots, N$, be the standardised landmark coordinates for $n$ specimens, where $\tau_1, \ldots, \tau_N$ are the observed landmark points, with $N$ being the number of landmarks on a dimensional domain (Happ-Kurz, 2020) based on the crania of the shrews (Figs. 2a, 3a, 4a), which will be discussed in detail in Section "Methodology". Let $(x_{k,\tau_1}, \ldots, x_{k,\tau_N})^T$ and $(y_{k,\tau_1}, \ldots, y_{k,\tau_N})^T$, where $k = 1, \ldots, n$, be the separated standardised landmarks for $n$ specimens for the x- and y-coordinates, respectively. The raw data is then converted into functions to implement functional data in an object-oriented way. For example, by using the sampling points (boundaries) $\tau_1, \ldots, \tau_N$ and the set of discrete raw data (values of landmarks) for the x-coordinates, a univariate functional data sample $\{X^{(1)}, \ldots, X^{(n)}\}$ is constructed (Figs. 2c, 3c, 4c). This is also done for the y-coordinates to construct the functional sample $\{Y^{(1)}, \ldots, Y^{(n)}\}$. The FDGM methodology proposed in this work involves multivariate functional PCA and PCA score-based classification on the landmark coordinates of the craniodental views of shrew specimens.
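As a rough illustration of turning discrete landmark values into a function, the sketch below represents one specimen's x-coordinates as a linear combination of piecewise-linear "hat" basis functions. This basis is a stand-in chosen for simplicity; the basis actually used in the study is not stated here. The equally spaced sampling points on [0, 1], the helper names, and the coordinate values are all hypothetical.

```python
def hat_basis(j, t, knots):
    """Piecewise-linear basis function that equals 1 at knots[j] and 0 at other knots."""
    centre = knots[j]
    if t == centre:
        return 1.0
    if j > 0 and knots[j - 1] < t < centre:
        return (t - knots[j - 1]) / (centre - knots[j - 1])
    if j < len(knots) - 1 and centre < t < knots[j + 1]:
        return (knots[j + 1] - t) / (knots[j + 1] - centre)
    return 0.0


def as_function(values, knots):
    """Return f(t) = sum_j values[j] * hat_basis(j, t, knots)."""
    def f(t):
        return sum(v * hat_basis(j, t, knots) for j, v in enumerate(values))
    return f


# Hypothetical standardised x-coordinates of N = 5 landmarks for one specimen,
# observed at assumed sampling points tau_j = j / (N - 1).
x_values = [0.10, 0.35, 0.20, -0.15, -0.30]
n_landmarks = len(x_values)
tau = [j / (n_landmarks - 1) for j in range(n_landmarks)]

x_curve = as_function(x_values, tau)
value_at_landmark = x_curve(tau[2])   # reproduces the observed value
value_between = x_curve(0.125)        # midway between the first two landmarks
```

With this basis the coefficients are simply the observed landmark values, so the curve interpolates the data; a smoother basis such as B-splines would follow the same pattern with coefficients obtained by least squares.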

Multivariate functional principal component analysis for craniodental views of shrew specimens
The two univariate functional datasets compose multivariate functional data with n outlines, each yielding a vector of N sampling points (landmarks) defined on a d-dimensional functional domain. We used the MFPCA package to compute the MFPCA estimates on the multivariate functional data based on their univariate counterparts 47. Multivariate functional principal component analysis uses the multivariate functional data obtained from landmarks that are independently and identically distributed. The PCA basis functions are estimated from the multivariate functional data, denoted by X. These basis functions were then applied to the n specimens of X based on the PACE (principal components analysis through conditional expectation) approach 48.
Let $X = (X_1, X_2)^T$ be a vector-valued stochastic process whose elements correspond to the functional random variables related to the standardised landmark x- and y-coordinates, respectively. Given the sample of $n$ i.i.d. sampling points $X^{(1)}, \ldots, X^{(n)}$ of $X$, the estimation procedure for MFPCA involves the following steps:
1. For each element $X_p$, estimate a univariate FPCA based on the sampling points of $X_p$ by estimating the variance-covariance function $K_p(\cdot, \cdot)$. This results in the estimated eigenfunctions $\phi_{p,j}$ and scores $\xi^{(i)}_{p,j}$, $i = 1, \ldots, n$, $j = 1, \ldots, J_p$, where $J_p$ is the number of eigenvalues.
2. Define the matrix $\Xi \in \mathbb{R}^{n \times J}$ with $J = \sum_{p=1}^{2} J_p$, where each row $(\psi^{(i)}_{1,1}, \ldots, \psi^{(i)}_{1,J_1}, \psi^{(i)}_{2,1}, \ldots, \psi^{(i)}_{2,J_2})$ contains the estimated scores for the 2 components of the $i$-th sampling point (the univariate scores of step 1, written here as $\psi$). Consider the matrix $Z \in \mathbb{R}^{J \times J}$ consisting of blocks $Z^{(pq)} \in \mathbb{R}^{J_p \times J_q}$ with entries $Z^{(pq)}_{jk} = \mathrm{Cov}(\psi_{p,j}, \psi_{q,k})$, $j = 1, \ldots, J_p$, $k = 1, \ldots, J_q$, $p, q = 1, 2$. An estimate $\hat{Z} \in \mathbb{R}^{J \times J}$ of the matrix $Z$ is given by $\hat{Z} = (n - 1)^{-1}\, \Xi^{T} \Xi$.
3. Perform a matrix eigen-analysis of $\hat{Z}$, yielding its eigenvalues and the corresponding orthonormal eigenvectors $v_j$.
4. Calculate the elements of the estimated multivariate eigenfunctions, $\phi_{p,j}(t_p)$ (where $t_p$ is any landmark point in dimension $p$), and the corresponding multivariate scores, $\xi^{(i)}_j$.
These estimated eigenfunctions are derived under the assumption of a finite sample of size n and a finite Karhunen-Loève representation for each univariate function $X_p$. They are relevant in practice with an appropriate choice of truncation orders.
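Steps 2 to 4 of the procedure above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the MFPCA package's implementation: the univariate scores are assumed to be already estimated and centred, the score values are invented, only the leading component is extracted (by power iteration), and the function names are hypothetical. The multivariate score is computed as the row of the score matrix multiplied by the eigenvector, which is the form used in Happ and Greven's MFPCA formulation.

```python
import math


def score_covariance(xi):
    """Estimate Z as Xi^T Xi / (n - 1) for an n x J matrix of centred univariate scores."""
    n, n_cols = len(xi), len(xi[0])
    return [[sum(xi[i][a] * xi[i][b] for i in range(n)) / (n - 1)
             for b in range(n_cols)] for a in range(n_cols)]


def leading_eigenpair(z, n_iter=500):
    """Power iteration for the largest eigenvalue and eigenvector of a symmetric matrix."""
    size = len(z)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(n_iter):
        w = [sum(z[a][b] * v[b] for b in range(size)) for a in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    zv = [sum(z[a][b] * v[b] for b in range(size)) for a in range(size)]
    eigenvalue = sum(v[a] * zv[a] for a in range(size))  # Rayleigh quotient
    return eigenvalue, v


def multivariate_scores(xi, v):
    """Project each specimen's stacked univariate scores onto the eigenvector."""
    return [sum(row[a] * v[a] for a in range(len(v))) for row in xi]


# Hypothetical centred univariate scores for n = 4 specimens:
# two columns from the x-coordinate FPCA (J_1 = 2), one from the y-coordinate FPCA (J_2 = 1).
xi = [
    [2.0, 0.5, 1.0],
    [-1.0, 0.2, -0.5],
    [0.5, -0.4, 0.3],
    [-1.5, -0.3, -0.8],
]

z_hat = score_covariance(xi)
lam, v = leading_eigenpair(z_hat)
scores = multivariate_scores(xi, v)
```

In this construction the sample variance of the multivariate scores equals the leading eigenvalue, which is a convenient sanity check on any implementation of these steps.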

Functional linear discriminant analysis for craniodental views of shrew specimens
We applied the multivariate scores $\xi^{(i)}_j$, $i = 1, \ldots, n$; $j = 1, \ldots, K$, where K is the truncated number of components obtained from the landmarks, in FLDA to distinguish among the categories studied. The results were compared with the findings observed when PCA is used in linear discriminant analysis (LDA), based on the GM approach. This supervised learning approach also serves as a dimensionality reduction technique and is often used to model differences among groups. FLDA has the potential to be an efficient tool for improving classification.
The first three (K = 3) PC scores were used in LDA for GM and FDGM to compare rates of classification to achieve low classification errors despite a major data reduction 49. Functional linear discriminant analysis (FLDA) uses a spline curve which is parameterised using a basis function multiplied by a d-dimensional coefficient vector to effectively transform the data into a single d-dimensional space 50. The classifier also includes the random error to model sampling points from each individual 50.
The coefficient vector is then modeled using a Gaussian distribution with a common covariance matrix for all classes by analogy with LDA 50. The observed curves can then be pooled to estimate the covariance and mean for each class, which makes it possible to form accurate estimates for each curve based on only a few sampling points 50.
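Let M be the number of classes, let Q denote the within-class covariance matrix, that is, the covariance matrix of the variables centred on the class means, and let B denote the between-class covariance matrix, that is, the covariance matrix of the predictions by the class means 51. Let H be the M × W matrix of class means, where W ≥ 2 is the number of variables. Denote by G the n × M matrix of class indicator variables. Thus, the predictions are GH. With $S$ denoting the n × W data matrix (here, the PC scores) and $\bar{s}$ the row vector of overall variable means, the sample covariance matrices are as follows:
$$Q = \frac{(S - GH)^{T}(S - GH)}{n - M}, \qquad B = \frac{(GH - \mathbf{1}\bar{s})^{T}(GH - \mathbf{1}\bar{s})}{M - 1}.$$
Linear discriminant analysis maximises the ratio of the separation of the class means to the within-class variance by maximising the ratio $a^{T}Ba / a^{T}Qa$, where $a$ is the eigenvector of $Q^{-1}B$ corresponding to the largest eigenvalue 52.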

Classification models
In addition to FLDA, we applied classification methods such as NB, SVM, RF, and GLM to enhance the classification of species among the shrews with the aid of the functional principal component scores (MFPC scores) and PCA scores using the GM method to reduce time complexity. This was done using the e1071, MASS, and caret packages in R. The combined analysis of all three views and each separate view was performed. For each view, the data set is split into training and test samples (70:30) and this procedure is iterated 20 times. For each model, the learning is based on the training sample while the performance is assessed by the accuracy based on the test sample. The accuracy and the standard deviation of the 20 iterations are tabulated in Table 1.
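The repeated 70:30 evaluation described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study: a nearest-centroid rule stands in for the NB, SVM, RF, and GLM classifiers, the splits are simple random (unstratified) splits, and the three-dimensional "scores" are synthetic values generated only to make the example run. All names are hypothetical.

```python
import random
import statistics


def fit_centroids(rows, labels):
    """Compute the mean score vector of each class (a stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def repeated_holdout(rows, labels, n_rep=20, train_frac=0.7, seed=1):
    """Mean and standard deviation of test accuracy over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    accuracies = []
    for _ in range(n_rep):
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Synthetic stand-in for the first three PC scores of three groups (not real data).
data_rng = random.Random(0)
centres = {"A": (0.0, 0.0, 0.0), "B": (5.0, 5.0, 5.0), "C": (-5.0, 5.0, 0.0)}
X, y = [], []
for label, centre in centres.items():
    for _ in range(30):
        X.append([data_rng.gauss(m, 1.0) for m in centre])
        y.append(label)

mean_acc, sd_acc = repeated_holdout(X, y)
```

Reporting the mean together with the standard deviation over the 20 splits, as in Table 1, indicates how sensitive each classifier is to the particular choice of training specimens.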
Brief descriptions of the classification models used on the separate views as well as the combination of all three views are as follows:
(i) Naïve Bayes. The naïve Bayes (NB) classification model is a classifier that estimates the posterior probability of each class from the predictors of the training data 53. Based on the MFPC scores obtained from this study, the Bayes theorem can be written as follows: $P(s \mid \xi) = P(\xi \mid s)\,P(s) / P(\xi)$, where $s$ denotes a species and $\xi$ denotes the vector of MFPC scores of a specimen.
(ii) Support Vector Machine. Support Vector Machine (SVM) addresses a multi-class problem as a single "all-together" optimization. This classifier can be used to find a hyperplane in a 2-dimensional space that will separate the scores into their potential species. As this study emphasises 2D, the equation of the hyperplane in the two domains can be given as follows: $\sum_{i} w_i X_i + b = 0$, where $w_i$ are the weights for the first three MFPC scores, $b$ is the bias term ($w_0$), and $X_i$ are the variables.
The three main hyperparameters in SVM are the cost parameter (C), gamma ($\gamma$), and kernel. The cost (C) is the penalty parameter of the error term, which controls the trade-off between achieving a low training error and a low testing error. The gamma ($\gamma$) hyperparameter defines the influence of individual training samples, and the kernel is used for mapping the input data into a higher-dimensional space. The radial basis function (RBF) is selected as the kernel function in this study due to its strong classification performance and its versatility in application without requiring prior knowledge of the dataset 54. SVM-RBF can be defined as follows: $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^{2})$, where $\gamma > 0$ and $\gamma = 1/(2\sigma^{2})$.
(iii) Random Forest. Random forests (RF) is an algorithm for classification developed by Breiman (2001) 36,55 that is based on bootstrap aggregating, or bagging, which combines the predictions of multiple decision trees to make a final prediction. This helps to reduce the variance of the individual trees, therefore reducing the overall expected prediction error of the forest. The working algorithm of the RF classifier is as follows: bootstrap samples are drawn from the training data, a decision tree is grown on each sample, and the predictions of the trees are combined by majority vote.
(iv) Generalised Linear Model. The GLM classifier here is based on the elastic net penalty, which combines both L1 (LASSO) and L2 (ridge) penalties. In the context of geometric morphometrics, elastic net regularisation can be applied to GLMs to control the complexity of the model and prevent overfitting when analysing shape data. In GLM, parameters are assigned to control the L1 and L2 regularisations as well as the strengths of these regularisations. This classifier is based on the MFPC scores, which enter through the linear predictor $\eta_i = \beta_0 + \sum_{j=1}^{3} \beta_j \xi^{(i)}_j$, with a link function $g$ that describes how the mean, $E(Y_i) = \mu_i$, depends on the linear predictor, $g(\mu_i) = \eta_i$. The GLM classifier also has a variance function $V$ that describes how the variance, $\mathrm{var}(Y_i)$, depends on the mean, $\mathrm{var}(Y_i) = \phi V(\mu_i)$, where the dispersion parameter $\phi$ is a constant.
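Two of the quantities described above are simple enough to compute directly, which may help when choosing hyperparameters. The sketch below, in plain Python with hypothetical function names and input values, evaluates the RBF kernel with gamma derived from sigma, and an elastic net penalty in the parameterisation lambda * (alpha * L1 + (1 - alpha) / 2 * L2) used by, for example, glmnet. Whether the study's GLM used exactly this parameterisation is an assumption.

```python
import math


def gamma_from_sigma(sigma):
    """Convert a kernel width sigma to gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: alpha = 1 gives the LASSO, alpha = 0 gives ridge."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


# Hypothetical score vectors (three PC scores each) and coefficients.
specimen_a = (0.0, 0.0, 0.0)
specimen_b = (1.0, 0.0, 0.0)
gamma = gamma_from_sigma(1.0)

k_same = rbf_kernel(specimen_a, specimen_a, gamma)   # identical points
k_ab = rbf_kernel(specimen_a, specimen_b, gamma)     # points one unit apart

lasso_only = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=1.0)
ridge_only = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.0)
```

A larger gamma makes the kernel decay faster with distance, so each training specimen influences a smaller neighbourhood; alpha moves the penalty between the ridge and LASSO extremes while lambda sets its overall strength.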

Results
Multivariate functional principal component analysis (MFPCA) using the functional data of all views combined gave a total of 30 eigenvalues. The first two MFPCs accounted for 82.61% of the total variation in the species of shrews. PCA using the GM method yields 88 principal components, where the first two PCs explained 64.12%. The functional principal components show a clearer separation (Fig. 5). Although S. murinus does seem to be well separated in the GM method, the approach could not clearly distinguish the other two shrew species.
Thus, the FDGM approach has potential for examining the species variation of the shrews. When PCA is separately conducted on each view, the dorsal view gives the best separation for the three shrew species compared to the other two views for both GM and FDGM methods (Fig. 5).
The dorsal view yielded a total of 9 MFPCs and the first two MFPCs explained 82.44% of the variation among the species. The GM method yields 46 PCs and the first two explained 59.02% of the variation. The MFPCA results gave a better separation among the three shrew species compared to the GM method.
There are 10 MFPCs for the jaw view, where the first two MFPCs explained 91.41% of the variation in the species. There is a total of 88 classical PCs for the jaw view, where the first two explained 77.73% of the variation. As for the lateral view, there is a total of 10 MFPCs and the total variation in species explained by the first two MFPCs is 90.90%. Out of 88 PCs, the first two PCs of the GM approach for the lateral view explained 78.00% of the total variation. Although S. murinus is somewhat separated, the jaw view and lateral view show poor separation for all three species for the GM approach (Fig. 5). A slight improvement in species separation can be observed in the FDGM approach for both views. The performance of the classification models based on individual craniodental views and the combination of all three is evaluated using the first three PC scores of both the FDGM and GM approaches, as the first three PCs of all the craniodental views lie within the general rule of thumb threshold of 80% in the FDGM approach. The overall improvement in results for all the classification models when the FDGM approach is applied compared to the GM method is shown in Table 1.
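The 80% rule of thumb mentioned above amounts to finding the smallest number of components whose cumulative share of the total variance reaches the threshold. A minimal Python sketch follows; the function name is hypothetical and the eigenvalues are illustrative values, not results from this study.

```python
def components_for_threshold(eigenvalues, threshold=0.80):
    """Return the smallest K whose cumulative variance share reaches the threshold,
    together with that cumulative share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(ordered), 1.0


# Illustrative eigenvalues only (not taken from the paper).
example_eigenvalues = [6.0, 2.5, 1.0, 0.5]
n_components, share = components_for_threshold(example_eigenvalues)
```

Using a fixed K across methods, as done here with the first three PC scores, keeps the classifiers' input dimension equal, while a check like this confirms that the retained components pass the threshold.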
The first three PC scores from GM and FDGM were then used in LDA to observe the percentage of separation among the three shrew species based on the craniodental views. Based on GM, the percentage of separation achieved by the first discriminant function is 87.15% and that of the second discriminant function is 12.85% when all three craniodental views are combined. It is noticeable that the groups are quite well separated, with FDGM showing better separation among the three species (Fig. 6) compared to the GM method. The percentage of separation achieved by the first discriminant function in FDGM is 99.9%, compared with 87.15% in GM.
Based on the results obtained in FLDA when the three craniodental views are observed separately, the dorsal view showed a distinct separation of S. murinus compared to the other two shrew species, which overlapped (Fig. 6). The clusters among the three species are more compact when FLDA is used compared to the GM method. Based on GM, the percentage of separation achieved by the first discriminant function is 89.15%, 97.93%, and 88.40% for the dorsal, jaw, and lateral views, respectively. The percentages of separation by the first discriminant function showed improvement in the FDGM method: 99.92% for the dorsal view, 99.9% for the jaw view, and 97.38% for the lateral view. C. monticola seems to be well grouped in the FDGM method for all views.
An additional case study, featuring a smaller number of landmarks with a large sample size, is provided in the supplemental materials to showcase the use of the FDGM framework.

Discussion
This study compared the FDGM and GM approaches to study the classification of S. murinus, C. monticola, and C. malayana using craniodental landmarks extracted from the skull images of shrews. Distinct clusters of the shrew species are highlighted when the standardised landmarks of the three craniodental views combined are analysed by FDGM and GM methods. Functional data geometric morphometrics is a better solution as the outlines of the skulls are treated as continuous curves rather than discrete points 56. High-dimensional data can slow down traditional statistical algorithms, which can lead to challenges for standard classification methods when handling such data. Dimension reduction techniques are applied to retain relevant information and reduce correlations, speeding up subsequent analyses while improving accuracy. Functional principal component analysis (FPCA) is commonly used for this purpose and allows for the exploration of data variability in individual curve shapes 18. As shown in Fig. 5, PCA based on GM does not give a better separation of the shrew species compared to MFPCA of the FDGM approach; rather, it shows comparable results. When the three craniodental views were individually examined (Fig. 5), the dorsal view showed the clearest separation among the three shrew species using both approaches. This is because the dorsal view gives the most comprehensive view of the skull, which includes landmarks on all the major cranial features. Based on the results obtained, this study reveals that the dorsal view of the shrew skulls can be the most informative view for distinguishing between the three shrew species.
Figure 6. LDA plot using GM method and FLDA plot using FDGM method for all combined craniodental views, dorsal view, jaw view, and lateral view.
The least favorable separations are observed from the jaw view, although MFPCA of the FDGM approach shows some improvement and is comparable to the GM method. This may be due to the similarities in C. monticola and C. malayana as they belong to the same genus. The edges of the molar region tend to be similar for both species. The horseshoe effect present in the GM approach may indicate species turnover along environment gradients 57. This effect has been commonly observed in ecological ordination obtained by PCA using the GM method 58. The plots of the MFPCA scores reveal the presence of functional manifolds where the horseshoe effect is noticed 59. The lateral view also indicates an overlap between the two species and comparable results for both methods. This can be due to the similarity of the back curvature between the two, as the region tends to be flat and a little sharp for S. murinus. This demonstrates that FDGM can face difficulties in precisely capturing complex and non-linear shape alterations. This is primarily due to the intricate nature of shape transformations in biological structures, which can be influenced by a multitude of factors including genetic variation, developmental processes, and environmental influences.
Considering that the FDGM study relies on functions of craniodental curves based on landmarks, an improvement compared to GM in classification performance for all four models is evident. The dorsal view gives the best rate of classification accuracy among the three views.
Functional data geometric morphometrics integrates principles from FDA with GM methodologies. It treats landmark data as functions, allowing for the analysis of dynamic shape changes over continuous domains. By representing the curves of the three craniodental views as functional forms, FDGM captures the inherent variability and intricacies of morphological changes more comprehensively than discrete landmark-based GM methods. FDGM yields results comparable to those of the GM methods in analysing dynamic shape changes, which may be challenging for static landmark-based GM methods. In addition, the FDGM framework has the potential to accommodate irregularly sampled or noisy data more effectively, as it does not rely on fixed landmark configurations. However, one potential drawback of this framework is the requirement of dense and well-sampled data to accurately capture shape dynamics, which may not always be feasible.

Concluding remarks
In this study, we proposed the use of FDGM on landmark data to represent the shapes of the dorsal, lateral, and jaw views of shrew skulls. Results confirm that FDGM improves classification among the three species compared to the GM approach. In particular, the dorsal view emerges as the best representation for classifying the species in both approaches. The proposed approach utilises data smoothing to represent landmark coordinates as a function derived from raw data, enhancing pattern clarity and making it a promising tool in morphometrics research. However, FDGM may encounter challenges in accurately capturing complex and non-linear shape transformations. This is because biological structures often exhibit complex shape transformations influenced by a myriad of factors, such as genetic variation, developmental processes, and environmental influences. Capturing these complex shape variations accurately with FDGM may require more sophisticated modeling techniques and larger, more diverse datasets. Additionally, integrating FDA techniques with GM requires careful data preprocessing and analytical methods to mitigate biases or errors. Despite these challenges, FDGM offers a more sophisticated approach to analysing shape variation by modeling shape changes as continuous functions. This departure from traditional discrete landmark-based methods allows for a more comprehensive representation of shape, capturing subtle variations and non-linear transformations more effectively. By exploring the theoretical and practical advancements offered by FDGM, this study aims to contribute to the methodological toolkit of GM and facilitate more accurate and insightful analyses of biological shape data. Additionally, FDGM integrates principles from functional data analysis with geometric morphometrics, providing a more robust framework for analysing shape data. Practically, FDGM enhances the accuracy and sensitivity of shape analysis by enabling the examination of shape changes along continuous curves or surfaces. This can lead to more precise identification of shape differences between groups and a better understanding of shape variation within populations. Future studies can address these challenges and explore the potential of FDGM further. The aim of this study was to assess how FDA can be used to enhance GM. As a result, we avoided specific treatments such as sliding landmark analysis on semi-landmarks in GM. This approach may pave the way for future research to focus on analyses that differentiate between landmarks and semi-landmarks. Additionally, ongoing research on three-dimensional FDGM extensions holds promise for further enhancing morphometrics analysis.
https://doi.org/10.1038/s41598-024-66246-z

Figure 1. Digital skull images of dorsal, jaw and ventral views of C. malayana, C. monticola, and S. murinus.

Figure 2. (a) 25 landmarks included for dorsal view of C. malayana. Landmarks and semilandmarks are represented by red and light blue dots, respectively. (b) Dimension 1 of converted functional data of the landmark data for the dorsal view using the FDGM method (black lines represent specimens). (c) Dimension 2 of converted functional data of the landmark data for the dorsal view using the FDGM method (black lines represent specimens).

Figure 3. (a) 50 landmarks included for jaw view of C. malayana. Landmarks and semilandmarks are represented by red and light blue dots, respectively. (b) Dimension 1 of converted functional data of the landmark data for the jaw view using the FDGM method (black lines represent specimens). (c) Dimension 2 of converted functional data of the landmark data for the jaw view using the FDGM method (black lines represent specimens).

Figure 4. (a) 47 landmarks included for the lateral view of C. malayana. Landmarks and semilandmarks are represented by red and light blue dots, respectively. (b) 2D representation of the x- and y-coordinates for the 47 landmarks of crania for the lateral view. (c) Dimension 1 of converted functional data of the landmark data for the lateral view using the FDGM method (black lines represent specimens). (d) Dimension 2 of converted functional data of the landmark data for the lateral view using the FDGM method (black lines represent specimens).

Figure 5. PCA plot using GM method and MFPCA plot using FDGM method for all combined craniodental views, dorsal view, jaw view, and lateral view.

Table 1. The mean accuracy on the test sample and the corresponding standard deviations (in brackets) based on 20 replications using the MFPCA and PCA scores for (a) dorsal, jaw, and lateral combined; and (b) individual views.